225 research outputs found

Automatic detection of borrowing (Open problems in computational diversity linguistics 2)

Author: Johann-Mattis List
Publication venue: Blogger
Publication date: 01/01/2019

This is the third in a series of 12 blog posts published in 2019, discussing open problems in computational diversity linguistics. It discusses the problem of automatic borrowing detection.

Humanities Commons

Automatic sound law induction (Open problems in computational diversity linguistics 3)

Author: Johann-Mattis List
Publication venue: Blogger
Publication date: 01/01/2019

This is the fourth in a series of 12 blog posts published in 2019, discussing open problems in computational diversity linguistics. It discusses the problem of automatic sound law induction.

Humanities Commons

A Computational Model for the Assessment of Mutual Intelligibility Among Closely Related Languages

Authors: Johann-Mattis List, Jessica Nieder
Publication date: 05/02/2024

Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model's comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach not only offers new methodological findings for automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.
Comment: To appear in: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2024).

arXiv.org e-Print Archive

Bangime: secret language, language isolate, or language island? A computer‐assisted case study

Authors: Abbie Hantgan, Johann-Mattis List
Publication venue: Edinburgh University Library
Publication date: 07/09/2022

We report the results of a qualitative and quantitative lexical comparison between Bangime and neighboring languages. Our results indicate that the status of the language as an isolate remains viable, and that Bangime speakers have had different levels of language contact with other Malian populations at various points throughout their history. Bangime speakers, the Bangande, claim Dogon ancestry. The Bangande portray this connection to Dogon through the fact that the language has both recent borrowings from neighboring Dogon varieties and more rooted vocabulary from Dogon languages spoken to the east, from where the Bangande claim to have come. Evidence of multilayered long-term contact is clear: lexical items have permeated even core vocabulary. However, strikingly, the Bangande are seemingly unaware that their language is not intelligible with any Dogon variety. We hope that our findings will influence future studies on the reconstruction of the Dogon languages and other neighboring language varieties to shed light on the mysterious history of Bangime and its speakers.

Papers in Historical Phonology

Journal Hosting Service | The University of Edinburgh

MPG.PuRe

Trimming Phonetic Alignments Improves the Inference of Sound Correspondence Patterns from Multilingual Wordlists

Authors: Frederic Blum, Johann-Mattis List
Publication date: 31/03/2023

Sound correspondence patterns form the basis of cognate detection and phonological reconstruction in historical language comparison. Methods for the automatic inference of correspondence patterns from phonetically aligned cognate sets have been proposed, but their application to multilingual wordlists requires extremely well-annotated datasets. Since annotation is tedious and time-consuming, it would be desirable to find ways to improve aligned cognate data automatically. Taking inspiration from trimming techniques in evolutionary biology, which improve alignments by excluding problematic sites, we propose a workflow that trims phonetic alignments in comparative linguistics prior to the inference of correspondence patterns. Testing these techniques on a large standardized collection of ten datasets with expert annotations from different language families, we find that the best trimming technique substantially improves the overall consistency of the alignments. The results show a clear increase in the proportion of frequent correspondence patterns and words exhibiting regular cognate relations.
Comment: The paper was accepted at the SIGTYP workshop 2023 co-located with EAC

arXiv.org e-Print Archive
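
The abstract above describes trimming as excluding problematic sites from a phonetic alignment before correspondence patterns are inferred. As a rough illustration of that general idea only, the following minimal sketch drops alignment columns in which the share of gaps exceeds a threshold. The function name `trim_alignment`, the `max_gap_ratio` parameter, and the example forms are assumptions made for this sketch; it is not the trimming procedure evaluated in the paper.

```python
from typing import List

GAP = "-"  # gap symbol used in the toy alignment below


def trim_alignment(alignment: List[List[str]],
                   max_gap_ratio: float = 0.5) -> List[List[str]]:
    """Drop every column whose proportion of gaps exceeds max_gap_ratio.

    Hypothetical helper: a plain gap-threshold trim, shown only to
    illustrate what "excluding problematic sites" can mean.
    """
    if not alignment:
        return []
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    # Indices of the columns that are kept.
    keep = [
        col for col in range(n_cols)
        if sum(row[col] == GAP for row in alignment) / n_rows <= max_gap_ratio
    ]
    return [[row[col] for col in keep] for row in alignment]


# Invented aligned forms, one row per language variety.
example = [
    ["t", "o", "x", "t", "ə", "r"],
    ["d", "ɔ", "x", "t", "ə", "r"],
    ["d", "ɔ", "-", "t", "ə", "-"],
]

# Columns 2 and 5 each contain one gap in three rows (ratio 1/3 > 0.3).
trimmed = trim_alignment(example, max_gap_ratio=0.3)
for row in trimmed:
    print(" ".join(row))
```

With the default threshold of 0.5 the same input is returned unchanged, since no column has more than one gap in three rows.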

Statistical proof of language relatedness (Open problems in computational diversity linguistics 7)

Author: Johann-Mattis List
Publication venue: Blogger
Publication date: 01/01/2019

This is the eighth in a series of 12 blog posts published in 2019, discussing open problems in computational diversity linguistics. It discusses the problem of statistical proof of language relatedness.

Humanities Commons

Formal and quantitative approaches to historical language comparison

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2022

Lecture given at the Fifth Pavia International Summer School for Indo-European Linguistics (Università di Pavia, 2022-09-05/09).

Humanities Commons

Save the trees

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2019

Skepticism regarding the tree model has a long tradition in historical linguistics. Although scholars have emphasized that the tree model and its long-standing counterpart, the wave theory, are not necessarily incompatible, the opinion that family trees are unrealistic and should be completely abandoned in the field of historical linguistics has always enjoyed a certain popularity. This skepticism has further increased with the advent of recently proposed techniques for data visualization which seem to confirm that we can study language history without trees. In this article, we show that the concrete arguments that have been brought up in favor of achronistic wave models do not hold. By comparing the phenomenon of incomplete lineage sorting in biology with processes in linguistics, we show that data which do not seem as though they can be explained using trees can indeed be explained without turning to diffusion as an explanation. At the same time, methodological limits in historical reconstruction might easily lead to an overestimation of regularity, which may in turn appear as conflicting patterns when the researcher is trying to reconstruct a coherent phylogeny. We illustrate how, in several instances, trees can benefit language comparison, although we also discuss their shortcomings in modeling mixed languages. While acknowledging that not all aspects of language history are tree-like, and that integrated models which capture both vertical and lateral language relations may depict language history more realistically than trees do, we conclude that all models claiming that vertical language relations can be completely ignored are essentially wrong: either they still tacitly draw upon family trees or they only provide a static display of data and thus fail to model temporal aspects of language history

Humanities Commons

Sequence Comparison in Historical Linguistics

Author: Johann-Mattis List
Publication venue: Düsseldorf University Press
Publication date: 01/01/2014

Memòria Digital de Catalunya

Düsseldorf University Press (d|u|p)

Multiple sequence alignment in historical linguistics. A sound class based approach

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2012

In this paper, a new method for multiple sequence alignment in historical linguistics is presented. The algorithm is based on the traditional framework of progressive multiple sequence alignment (cf. Durbin et al. 2002:143-149), whose shortcomings are addressed by (1) a sound class representation of phonetic sequences (cf. Dolgopolsky 1986, Turchin et al. 2010) accompanied by specific scoring functions, (2) the modification of gap scores based on prosodic context, and (3) a new method for the detection of swapped sites in already aligned sequences. The algorithm is implemented as part of the LingPy library (http://lingulist.de/lingpy), a suite of open source Python modules for various tasks in quantitative historical linguistics. The method was tested on a benchmark dataset of 152 manually edited multiple alignments covering data for 192 Bulgarian dialects (Prokić et al. 2009). The results show that the new method yields alignments which differ from the gold standard in only 5% of all sequences.

Humanities Commons
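
The last abstract above mentions aligning phonetic sequences through a sound class representation. The following minimal sketch illustrates that idea for the pairwise case only: segments are mapped to coarse classes and then aligned with a standard Needleman-Wunsch procedure. The class table `SOUND_CLASSES`, the uniform scores, and the function names are assumptions made for this illustration; this is neither the LingPy API nor the algorithm described in the paper, which additionally uses dedicated scoring functions, prosodic context, and progressive multiple alignment.

```python
from typing import Dict, List, Tuple

# Toy sound-class table (hypothetical, not an established class system).
SOUND_CLASSES: Dict[str, str] = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T", "θ": "T", "ð": "T",
    "k": "K", "g": "K", "x": "K",
    "s": "S", "z": "S", "ʃ": "S",
    "m": "M", "n": "N", "r": "R", "l": "R",
    "a": "V", "e": "V", "i": "V", "o": "V",
    "u": "V", "ə": "V", "ɔ": "V",
}


def to_classes(segments: List[str]) -> List[str]:
    """Map each segment to its class; unknown segments become '?'."""
    return [SOUND_CLASSES.get(seg, "?") for seg in segments]


def align(a: List[str], b: List[str], match: int = 1,
          mismatch: int = -1, gap: int = -1) -> Tuple[List[str], List[str]]:
    """Globally align two segment lists by comparing their sound classes."""
    ca, cb = to_classes(a), to_classes(b)
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] against b[j-1], judged by class.
        return match if ca[i - 1] == cb[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one best alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]


# Simplified, invented transcriptions used only as input for the sketch.
aligned_a, aligned_b = align(list("toxtər"), list("dɔtər"))
print(" ".join(aligned_a))
print(" ".join(aligned_b))
```

Because "t" and "d" share a class in the toy table, as do "o" and "ɔ", the two forms align segment by segment even though none of the initial symbols are identical, and the single gap falls opposite "x".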